Inference in distributed data clustering

Authors

  • Josenildo Costa da Silva
  • Matthias Klusch
Abstract

In this paper we address confidentiality issues in distributed data clustering, particularly the inference problem. We present the KDEC-S algorithm for distributed data clustering, which is shown to provide mining results while preserving the confidentiality of the original data. We also present a confidentiality framework with which we can state the confidentiality level of KDEC-S. The underlying idea of KDEC-S is to use an approximation of the density estimate such that the original data cannot be reconstructed beyond a given extent.

Similar resources

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Multi-Output Adaptive Neuro-Fuzzy Inference System for Prediction of Dissolved Metal Levels in Acid Rock Drainage: a Case Study

Pyrite oxidation, Acid Rock Drainage (ARD) generation, and associated release and transport of toxic metals are a major environmental concern for the mining industry. Estimation of the metal loading in ARD is a major task in developing an appropriate remediation strategy. In this study, an expert system, the Multi-Output Adaptive Neuro-Fuzzy Inference System (MANFIS), was used for estimation of...

ADAPTIVE NEURO FUZZY INFERENCE SYSTEM BASED ON FUZZY C–MEANS CLUSTERING ALGORITHM, A TECHNIQUE FOR ESTIMATION OF TBM PENETRATION RATE

The tunnel boring machine (TBM) penetration rate estimation is one of the crucial and complex tasks encountered frequently to excavate the mechanical tunnels. Estimating the machine penetration rate may reduce the risks related to high capital costs typical for excavation operation. Thus establishing a relationship between rock properties and TBM pe...

Inference on Distributed Data Clustering

In this paper we address confidentiality issues in distributed data clustering, particularly the inference problem. We present a measure of inference risk as a function of reconstruction precision and number of colluders in a distributed data mining group. We also present KDEC-S, which is a distributed clustering algorithm designed to provide mining results while preserving confidentiality of o...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Breast Cancer Risk Assessment Using adaptive neuro-fuzzy inference system (ANFIS) and Subtractive Clustering Algorithm

Introduction: The adaptive neuro-fuzzy inference system (ANFIS) is a soft computing model based on neural network precision and fuzzy decision-making advantages, which can highly facilitate diagnostic modeling. In this study we used this model in breast cancer detection. Methodology: A set of 1,508 records on the risk factors of cancerous and non-cancerous participants was used. First,...

Journal:
  • Eng. Appl. of AI

Volume 19

Publication date: 2006